spam classifier
SpamDam: Towards Privacy-Preserving and Adversary-Resistant SMS Spam Detection
Li, Yekai, Zhang, Rufan, Rong, Wenxin, Mi, Xianghang
In this study, we introduce SpamDam, an SMS spam detection framework designed to overcome key challenges in detecting and understanding SMS spam, such as the lack of public SMS spam datasets, growing privacy concerns about collecting SMS data, and the need for adversary-resistant detection models. SpamDam comprises four modules: an SMS spam radar that identifies spam messages reported on online social networks (OSNs); an SMS spam inspector for statistical analysis; SMS spam detectors (SSDs) that support both central training and federated learning; and an SSD analyzer that evaluates model resistance against adversaries in realistic scenarios. Leveraging SpamDam, we have compiled over 76K SMS spam messages from Twitter and Weibo between 2018 and 2023, forming the largest dataset of its kind. This dataset has enabled new insights into recent spam campaigns and the training of high-performing binary and multi-label classifiers for spam detection. Furthermore, federated learning is shown to be effective for privacy-preserving SMS spam detection. Additionally, we have rigorously tested the adversarial robustness of SMS spam detection models, introducing the novel reverse backdoor attack, which has proven both effective and stealthy in practical tests.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China (0.05)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Africa > Nigeria > Jigawa State > Dutse (0.04)
- Information Technology > Services (1.00)
- Information Technology > Security & Privacy (1.00)
Classification by sparse additive models
We consider (nonparametric) sparse additive models (SpAM) for classification. The design of a SpAM classifier is based on minimizing the logistic loss with sparse group Lasso/Slope-type penalties on the coefficients of the univariate additive components' expansions in orthonormal series (e.g., Fourier or wavelets). The resulting classifier is inherently adaptive to the unknown sparsity and smoothness. We show that under a certain sparse group restricted eigenvalue condition it is nearly minimax (up to log-factors) simultaneously across the entire range of analytic, Sobolev and Besov classes. The performance of the proposed classifier is illustrated on simulated and real-data examples.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
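As a rough sketch of the objective this abstract describes, one plausible form of a sparse group Lasso penalized logistic loss is written below. The notation is an assumption for illustration and is not taken from the paper: labels y_i in {0, 1}, d features, basis functions \psi_l, coefficient blocks \beta_j = (\beta_{jl})_l, and tuning parameters \lambda and \kappa. The Slope-type variant and any intercept term are omitted.

```latex
f_\beta(x) = \sum_{j=1}^{d} \sum_{l} \beta_{jl}\, \psi_l(x_j),
\qquad
\hat{\beta} = \arg\min_{\beta}\;
\frac{1}{n} \sum_{i=1}^{n} \Big( \log\big(1 + e^{f_\beta(x_i)}\big) - y_i\, f_\beta(x_i) \Big)
+ \lambda \sum_{j=1}^{d} \lVert \beta_j \rVert_2
+ \kappa \sum_{j=1}^{d} \sum_{l} \lvert \beta_{jl} \rvert
```

Under this reading, the group term can zero out entire features (sparsity across components), the within-group term can zero out individual series coefficients, and the classifier predicts class 1 when f_{\hat\beta}(x) \ge 0.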
Real-World, Man-Machine Algorithms
Behind the scenes, the same call automatically and invisibly decides whether a machine learning classifier is reliable enough to classify the example on its own, or whether human intervention is needed. Models get built automatically, they're continually retrained, and the caller never has to worry whether more data is needed. In the rest of this article, we'll go into more detail on the problems we described above--problems that are common to all efforts to deploy machine learning to solve real-world problems. In order to train any spam classifier, you'll first need a training set of "spam" and "not spam" labels.
- Leisure & Entertainment (0.48)
- Information Technology > Services (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
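The decision described above, whether a classifier is reliable enough to label an example on its own or whether a human should step in, can be sketched as a simple confidence threshold. This is only an illustration under stated assumptions: the function name `route_prediction`, the 0.9 threshold, and the use of a single spam probability are hypothetical and are not taken from the article.

```python
def route_prediction(spam_probability, threshold=0.9):
    """Decide whether the machine answers or a human reviews the example.

    spam_probability: the model's estimated probability that the example is spam.
    threshold: hypothetical minimum confidence for an automatic answer.
    """
    # Confidence is the probability of whichever class the model prefers.
    confidence = max(spam_probability, 1.0 - spam_probability)
    label = "spam" if spam_probability >= 0.5 else "not spam"

    if confidence >= threshold:
        # The model is confident enough to classify on its own.
        return ("machine", label)
    # Otherwise defer to a human; the human's answer could later be
    # added to the training set for retraining.
    return ("human", None)


if __name__ == "__main__":
    for p in (0.97, 0.03, 0.6):
        print(p, route_prediction(p))
```

Raising the threshold sends more examples to humans and fewer to the model, so in a scheme like this it would be the main knob for trading labeling cost against error rate.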
Document Classification with scikit-learn
Document classification is a fundamental machine learning task. It is used for all kinds of applications, like filtering spam, routing support requests to the right support rep, language detection, genre classification, sentiment analysis, and many more. To demonstrate text classification with scikit-learn, we're going to build a simple spam filter. While the filters in production for services like Gmail are vastly more sophisticated, the model we'll have by the end of this tutorial is effective, and surprisingly accurate. Spam filtering is kind of like the "Hello world" of document classification. However, something to be aware of is that you aren't limited to two classes.
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
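A simple spam filter of the kind this tutorial describes might look like the sketch below. It assumes a bag-of-words representation (`CountVectorizer`) feeding a multinomial naive Bayes classifier (`MultinomialNB`); the excerpt does not confirm which features or model the tutorial uses, and the tiny training set is made up purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy, hand-written training data (illustrative only, not from the tutorial).
train_texts = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free prize today",
    "meeting moved to monday morning",
    "lunch with the team on friday",
    "please review the attached report",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Chain word-count extraction and the classifier so that raw strings
# can be passed straight to fit() and predict().
spam_filter = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("classifier", MultinomialNB()),
])
spam_filter.fit(train_texts, train_labels)

predictions = spam_filter.predict([
    "free prize offer",
    "report for the monday meeting",
])

if __name__ == "__main__":
    print(list(predictions))
```

Because the labels are plain strings, the same pipeline extends to more than two classes simply by adding further label values to the training data.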